Internet Info 1997 December

Internet Message Format | 1997-02-19 | 6KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9) id OAA10723 for urn-ietf-out; Wed, 13 Nov 1996 14:57:11 -0500
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.6.10/8.6.9) with SMTP id OAA10718 for <urn-ietf@services.bunyip.com>; Wed, 13 Nov 1996 14:56:58 -0500
Received: from josef.ifi.unizh.ch by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA13771 (mail destined for urn-ietf@services.bunyip.com); Wed, 13 Nov 96 14:55:22 -0500
Received: from ifi.unizh.ch by josef.ifi.unizh.ch id <01921-0@josef.ifi.unizh.ch>; Wed, 13 Nov 1996 20:54:22 +0100
Subject: Re: [URN] Please avoid "URNs are"
To: moore@cs.utk.edu (Keith Moore)
Date: Wed, 13 Nov 1996 20:54:21 +0100 (MET)
Cc: moore@cs.utk.edu, Harald.T.Alvestrand@uninett.no, Dirk.vanGulik@jrc.it, FisherM@is3.indy.tce.com, girod@LCS.MIT.EDU, tallen@fsc.fujitsu.com, urn-ietf@bunyip.com
In-Reply-To: <199611131905.OAA15428@ig.cs.utk.edu> from "Keith Moore" at Nov 13, 96 02:05:06 pm
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 4344
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Message-Id: <"josef.ifi..262:13.10.96.19.54.32"@ifi.unizh.ch>
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Martin J Duerst <mduerst@ifi.unizh.ch>
Errors-To: owner-urn-ietf@bunyip.com

Keith Moore wrote:

>> >> PLEASE avoid saying "URNs are XXX". Please use wording such
>> >> as "URNs are noted on paper as XXX", "URNs are represented
>> >> on computers in the following forms...", or "URNs represent
>> >> characters from the following set..." or whatever.
>> >
>> >I'm not sure how much difference it makes. RFC 1737 requires
>> >the following:
>> >
>> > o Single encoding: The encoding for presentation for people in clear
>> > text, electronic mail and the like is the same as the encoding in
>> > other transmissions.
>> >> This requirement makes sense in the sense that the number of different
>> representations and encoding should be as low as possible.
>
>I hope that "as low as possible" is equivalent to "one".

Well, with EBCDIC and ASCII, you will certainly have two :-).

>> But it does not say anything about how URNs are noted on paper.
>
>No it doesn't. But it seems reasonable to expect that people will
>transcribe URNs from their screens to paper, and from that paper to
>keyboard. The result of doing so had better produce the same URN.

Of course. Modulo some problems of the kind of O/0 and l/1/I,
it will, for all of i18n.

>Also, we can't assume that the transcription of a URN from paper to
>keyboard will be through an intelligent tool that understands how to
>translate URNs from display format to some other format.

You need a reasonably intelligent human user, of course.
As for keyboarding tools, in the worst case you would need
a Unicode book (cheaper than most keyboards themselves, but
unfortunately heavy), a piece of paper, and a code table
for the 16 hex digits. As an alternative, I suggest some
nice site programmed in Java, or whatever.

>A URN could
>certainly be sent via email, transcribed to paper, and typed back into
>plain text email by someone else.

For Japanese postal transfer accounts, that would work without
problems, even if the Kanji character appears as such, and not
mutilated to some ASCII-compatible form. The only thing you have
to make sure is that if you cut/paste something from/to your
URN field in a browser, the underlying encoding is changed
appropriately (from ISO-2022-JP in the case of email to UTF-8,
with or without the additional %HH depending on what is needed).
Browsers such as Netscape do conversions correctly in other
instances, so I have no doubt they would get it right as soon
as they know what to do.

>For grandfathering other URN schemes, non-universal characters will
>need to be converted into sequences of universal characters.
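
The cut/paste conversion mentioned above (ISO-2022-JP in email, UTF-8 in
the URN field, with or without %HH) could look roughly like the following
Python sketch. The function names and the sample string are illustrative
assumptions only; nothing here is defined in this thread or in any URN
specification.

```python
from urllib.parse import quote, unquote_to_bytes


def mail_to_utf8(raw: bytes, mail_charset: str = "iso2022_jp") -> bytes:
    """Decode bytes as labeled by the mail charset and re-encode as UTF-8."""
    return raw.decode(mail_charset).encode("utf-8")


def to_percent_hh(utf8: bytes) -> str:
    """Escape every byte other than ASCII letters, digits and '_.-~' as %HH."""
    return quote(utf8, safe="")


def from_percent_hh(escaped: str) -> bytes:
    """Undo the %HH escaping, giving back the UTF-8 bytes."""
    return unquote_to_bytes(escaped)


# Hypothetical sample: two Kanji as they would arrive in an ISO-2022-JP mail.
sample = "日本"
raw_mail = sample.encode("iso2022_jp")

utf8 = mail_to_utf8(raw_mail)      # 8-bit form
escaped = to_percent_hh(utf8)      # 7-bit form with %HH
print(escaped)                     # %E6%97%A5%E6%9C%AC
```

Both forms carry the same characters; only the byte-level representation
differs, which is the distinction the discussion turns on.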
International standards use the term "universal character set" to
denote an encompassing set of characters. In order to avoid
misunderstanding by less informed readers, why not use
"greatest-common-denominator characters", or GCD characters for short?

>That way
>there can be a well-defined conversion from {j.random.naming.scheme}
>to URN, but only one format for the URN once it's converted. It's
>fine if smart software recognizes certain types of URNs and undoes the
>conversion, so long as the unconverted form is displayed as an
>identifier from {j.random.naming.scheme} and NOT as a URN.

We have been at this point before. It is mainly a naming issue,
and to some extent a protocol encoding issue. You are requesting
that things that don't appear %HH-encoded, but with the original
characters they represent, are not called URNs. You are also
requesting that there be no 8-bit transfer form.

>As for URNs encoded in EBCDIC: we should probably define URNs as
>sequences of characters,

Here wording really matters. To be clear, it is best to say
that URNs are *represented by* sequences of characters.
They *are* abstract, permanent identifiers or something
along these lines.

>which can be represented in any character
>encoding scheme, so long as:
>
>+ that encoding scheme is clearly labeled (e.g. MIME charset) whenever
>multiple encodings can appear in the same context, and
>
>+ there is a unique encoding in that scheme for each character that
>can appear in a URN.
>
>So URN character sequences could be encoded in ASCII or EBCDIC or
>UCS-32 or whatever and still be URNs.

A small detail: What you mean by UCS-32 is called UCS-4.
It might have been called UCS-31, but not UCS-32, because
in the 4-byte form, the uppermost bit (sign bit) is not used.

Regards, Martin.
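
The UCS-4 remark can be made concrete with a small Python sketch: four
bytes per character, of which only 31 bits are usable. The names below
and the choice of big-endian byte order are assumptions made for
illustration; the message itself does not specify a byte order.

```python
UCS4_MAX = 0x7FFFFFFF  # 2**31 - 1: the uppermost (sign) bit stays clear


def encode_ucs4(code_point: int) -> bytes:
    """Pack one code point into 4 bytes (big-endian assumed here)."""
    if not 0 <= code_point <= UCS4_MAX:
        raise ValueError("code point does not fit in 31 bits")
    return code_point.to_bytes(4, "big")


def decode_ucs4(data: bytes) -> int:
    """Unpack 4 bytes into a code point, rejecting a set top bit."""
    if len(data) != 4:
        raise ValueError("UCS-4 uses exactly 4 bytes per character")
    value = int.from_bytes(data, "big")
    if value > UCS4_MAX:
        raise ValueError("uppermost bit must not be used")
    return value


print(encode_ucs4(ord("A")).hex())   # 00000041
print(encode_ucs4(UCS4_MAX).hex())   # 7fffffff
```

A value such as 0x80000000 is rejected by both functions, which is why
"UCS-31" would have described the usable range more accurately than
"UCS-32".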